Trainable speech synthesis with trended hidden Markov models
Abstract
In this paper, we present a trainable speech synthesis system that uses the trended Hidden Markov Model to generate the trajectories of spectral features of synthesis units. The synthesis units are trained from a transcribed continuous speech corpus, making the speech more natural than that produced by conventional diphone synthesisers. Such synthesisers are generally trained from a highly articulated speech database and require a large investment of time and effort to train a new voice. The overall system has been incorporated into a PSOLA synthesiser to produce speech that is natural sounding and preserves the identity of the source speaker.
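As a rough illustration of what "generating a trajectory" from a trended HMM can mean, the sketch below assumes the common formulation in which each state's mean is a polynomial in the time elapsed since the state was entered. The function name `trended_trajectory`, the coefficient values, and the state durations are all hypothetical, and the residual noise term is omitted. This is not the system described in the abstract, only a minimal picture of the underlying idea for a single spectral feature.

```python
def trended_trajectory(states):
    """Concatenate per-state polynomial trends into one feature trajectory.

    `states` is a list of (coeffs, duration) pairs. coeffs[p] multiplies
    k**p, where k is the number of frames since the state was entered.
    Both the representation and the values used below are assumptions
    made for illustration only.
    """
    trajectory = []
    for coeffs, duration in states:
        for k in range(duration):  # k restarts at 0 on each state entry
            trajectory.append(sum(b * k ** p for p, b in enumerate(coeffs)))
    return trajectory


# Two hypothetical states with linear trends: one rising, one falling.
states = [([1.0, 0.5], 3), ([2.5, -0.25], 4)]
traj = trended_trajectory(states)
print(traj)
```

With a zero-order polynomial in every state, this reduces to the piecewise-constant means of a standard HMM, which is why the time-varying trend can give smoother feature trajectories within a unit.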
Similar resources
Application of the trended hidden Markov model to speech synthesis
This paper presents our work on a speech synthesis system that utilises the trended Hidden Markov Model to represent the basic synthesis unit. We draw upon both speech recognition and speech synthesis research to develop a system that is able to synthesise intelligible and natural sounding speech. Acoustic units are clustered using the decision tree technique and speech data corresponding to th...
Current status of the IBM Trainable Speech Synthesis System
This paper describes the current status of the IBM Trainable Speech Synthesis System. The system is a state-of-the-art, trainable, unit-selection based concatenative speech synthesiser. The system uses hidden Markov models (HMMs) to provide a phonetic transcription and HMM state alignment of a database of single-speaker continuous-speech training data. The runtime synthesiser uses the HMM state...
Speaker Independent Speech Recognition Using Hidden Markov Models for Persian Isolated Words
The IBM trainable speech synthesis system
The speech synthesis system described in this paper uses a set of speaker-dependent decision-tree state-clustered hidden Markov models to automatically generate a leaf level segmentation of a large single-speaker continuous-read-speech database. During synthesis, the phone sequence to be synthesised is converted to an acoustic leaf sequence by descending the HMM decision trees. Duration, energy...